fix(streaming): handle multi-byte UTF-8 chars split across chunks #3083
Greptile Overview
Greptile Summary
This PR fixes a critical bug where multi-byte UTF-8 characters (Turkish characters, emojis, CJK characters) were being corrupted during SSE streaming when split across HTTP chunk boundaries.
Key Changes:
The fix correctly handles both problems:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant readSSEStream
participant TextDecoder
participant Buffer
participant Callbacks
Client->>readSSEStream: body.getReader()
readSSEStream->>TextDecoder: new TextDecoder()
loop For each chunk
readSSEStream->>readSSEStream: reader.read()
alt Chunk received
readSSEStream->>TextDecoder: decode(value, { stream: true })
Note over TextDecoder: Maintains state for<br/>incomplete UTF-8 sequences
TextDecoder-->>Buffer: Decoded text
Buffer->>Buffer: Split by '\n\n'
Buffer->>Buffer: Keep incomplete message
loop For each complete SSE message
Buffer->>Buffer: Extract "data: " content
Buffer->>Buffer: JSON.parse(lineData)
alt Valid chunk
Buffer->>Callbacks: onChunk(data.chunk)
Buffer->>Callbacks: onAccumulated(accumulatedContent)
end
end
else Done
readSSEStream->>TextDecoder: decode()
Note over TextDecoder: Flush any remaining bytes
TextDecoder-->>Buffer: Final text
end
end
readSSEStream-->>Client: accumulatedContent
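The flow in the diagram can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the PR's actual code: the `SSECallbacks` interface, the function signature, and the choice to skip malformed JSON silently are assumptions. Only the names `readSSEStream`, `onChunk`, `onAccumulated`, `lineData`, and `accumulatedContent` come from the diagram.

```typescript
// Hypothetical callback shape; the real project may define this differently.
interface SSECallbacks {
  onChunk?: (chunk: string) => void;
  onAccumulated?: (content: string) => void;
}

async function readSSEStream(
  body: ReadableStream<Uint8Array>,
  callbacks: SSECallbacks = {},
): Promise<string> {
  const reader = body.getReader();
  // One decoder for the whole stream, so it can carry partial
  // multi-byte sequences from one chunk to the next.
  const decoder = new TextDecoder();
  let buffer = "";
  let accumulatedContent = "";

  const handleMessage = (message: string): void => {
    for (const line of message.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const lineData = line.slice("data: ".length);
      try {
        const data = JSON.parse(lineData);
        if (typeof data.chunk === "string") {
          accumulatedContent += data.chunk;
          callbacks.onChunk?.(data.chunk);
          callbacks.onAccumulated?.(accumulatedContent);
        }
      } catch {
        // Assumption: malformed payloads are skipped rather than thrown.
      }
    }
  };

  while (true) {
    const { done, value } = await reader.read();
    // stream: true keeps incomplete UTF-8 sequences inside the decoder;
    // the final argument-less decode() flushes whatever is left.
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });

    const messages = buffer.split("\n\n");
    // While the stream is open, the last element may be an incomplete message.
    buffer = done ? "" : messages.pop() ?? "";
    for (const message of messages) handleMessage(message);

    if (done) break;
  }
  return accumulatedContent;
}
```

Buffering on `'\n\n'` handles SSE messages split across chunks, while the shared streaming decoder handles characters split across chunks; the two mechanisms are independent.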
2 files reviewed, no comments
Summary
Pass { stream: true } to TextDecoder to maintain state across chunks
Fixes #3068
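To see why the streaming option matters, the following sketch decodes the same bytes with and without it. The helper `decodeChunks` is hypothetical and exists only for this illustration; `TextEncoder` and `TextDecoder` are the standard Encoding API.

```typescript
// Hypothetical helper: decode a list of byte chunks, either one call per
// chunk in isolation or through a single stateful streaming decoder.
function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder();
  let text = "";
  for (const chunk of chunks) {
    text += streaming
      ? decoder.decode(chunk, { stream: true }) // buffers incomplete sequences
      : decoder.decode(chunk); // treats every chunk as a complete input
  }
  // Flush any bytes still held by the streaming decoder.
  if (streaming) text += decoder.decode();
  return text;
}

// "ş" (U+015F) is two bytes in UTF-8: 0xC5 0x9F. Split it between chunks.
const encoded = new TextEncoder().encode("ş");
const splitChunks = [encoded.slice(0, 1), encoded.slice(1)];

const corrupted = decodeChunks(splitChunks, false); // "\uFFFD\uFFFD"
const intact = decodeChunks(splitChunks, true); // "ş"
```

Without the option, each half of the character is replaced by U+FFFD, which matches the corruption this PR describes.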
Type of Change
Testing
Checklist